
(b) generating permuted responses of genes by means of Monte Carlo 
randomization of perturbation index for the response of each gene across all 
perturbations; 

(c) performing cluster analysis on the permuted responses of genes; 

(d) determining for each cluster generated in step (c) the fractional improvement 
in the cluster analysis of genes based on the permuted responses of genes, 
wherein said fractional improvement is an improvement in total scatter with 
respect to a cluster center in going from one cluster to two clusters; and 

(e) repeating steps (b) through (d) so that a distribution of fractional 
improvements in the cluster analysis of the genes is obtained for each cluster 
generated by said cluster analysis; 

wherein the statistical significance of each of said sets of co-varying genes is determined by 
comparing the actual fractional improvement for the corresponding cluster to the distribution 
of fractional improvements for the corresponding cluster. 

REMARKS 

Claims 1, 3-50, 58-64, 72-78, 89-100 and 105-124 are pending in the application. In 
the instant Amendment, claims 1, 18, 26, 29, 38, 50, 64, 96, 106 and 123 have been amended
to clarify the present invention. Upon entry of the above-made amendments, claims 1, 3-50, 
58-64, 72-78, 89-100 and 105-124 will be pending. A marked version showing changes 
made to the amended claims is attached hereto as Exhibit A. A clean version of the pending 
claims, as amended, is attached hereto as Exhibit B. 

Claim 1 has been amended to recite that the claimed methods comprise determining, 
for each of a plurality of sets of cellular constituents in a plurality of response profiles, 
whether said set of cellular constituents is upregulated or downregulated by said first 
plurality of drug perturbations, and that the consensus profile for said first plurality of drug 
perturbations comprises measurements of said set or sets of cellular constituents that are 
determined in said determining step to be upregulated or downregulated by said first
plurality of drug perturbations (emphasis added). Claim 1 has also been amended to clarify
that each response profile results from a different drug perturbation among said first plurality
of drug perturbations to said type of cell or organism. Claims 29 and 38 have been amended
similarly. Support for the amendments is found in the specification at page 41, lines 21-24
and page 43, lines 7-31.

Claim 18 has been amended to recite that in the claimed method the cluster analysis is 
carried out by a hierarchical clustering method; that step (a) involves determining for each
cluster generated by said cluster analysis an actual fractional improvement in cluster analysis
of the cellular constituents based on the unpermuted responses of said cellular constituents,
step (d) involves determining for each cluster generated in step (c) the fractional 
improvement in the cluster analysis of cellular constituents based on the permuted responses 
of cellular constituents, step (e) involves repeating the steps of (b) through (d), i.e., 
generating permuted responses of cellular constituents, performing cluster analysis on the 
permuted responses of cellular constituents, and determining fractional improvement on the 
permuted data, so that a distribution of fractional improvements is obtained for each cluster 
generated by said cluster analysis; that in the claimed method the fractional improvement is
an improvement in total scatter with respect to a cluster center in going from one cluster to 
two clusters; and that the statistical significance for each of said sets of co-varying cellular 
constituents is determined by comparing the actual fractional improvement for the 
corresponding cluster to the distribution of fractional improvements for the corresponding 
cluster (emphasis added). Claims 64, 96 and 123 have been amended similarly. Support for 
the amendments is found in the specification at page 28, line 28, through page 30, line 20.
Claims 18, 64, 96 and 123 have also been amended to correct typographical errors. 

Claim 26 has been amended to recite that in the claimed method the cluster analysis is 
carried out by a hierarchical clustering method; that step (a) involves determining for each
cluster generated by said cluster analysis an actual fractional improvement in the cluster
analysis of the response profiles, step (d) involves determining for each cluster generated in
step (c) the fractional improvement in the cluster analysis on the permuted response profiles,
step (e) involves repeating said steps of (b) through (d), i.e., generating permuted response
profiles, performing cluster analysis on the permuted response profiles, and determining
fractional improvement on the permuted data, so that a distribution of fractional
improvements is obtained for each cluster generated by said cluster analysis; that in the
claimed method the fractional improvement is an improvement in total scatter with respect to
a cluster center in going from one cluster to two clusters; and that in the claimed method the
statistical significance of each of said sets of response profiles is determined by comparing
the actual fractional improvement for the corresponding cluster to the distribution of
fractional improvements for the corresponding cluster (emphasis added). Claims 50 and 106
have been amended similarly. Support for the amendments is found in the specification at
page 28, line 28, through page 30, line 20; and at page 37, lines 17-23.
No new matter has been added by the amendments. 

APPLICANTS' INTERVIEW SUMMARY 
Applicants thank Primary Examiner Ardin Marschel for the courtesies extended
during the telephone interview on January 8, 2002 (hereinafter "the Interview") with R.
Douglas Bradley, Applicant Yudong He and Applicants' representatives Adriane M. Antler
and Weining Wang. During the Interview, the claim rejections under 35 U.S.C. § 112, first
paragraph, 35 U.S.C. § 112, second paragraph, and 35 U.S.C. § 103(a) were discussed. The
reference Eisen et al., 1998, Proc. Natl. Acad. Sci. USA 95:14863 was also discussed as it
pertains to the claim rejections under 35 U.S.C. § 103(a).

The claim rejections in the instant Office Action under 35 U.S.C. § 112, first
paragraph, were first discussed. The Examiner agreed that the printout of a review of S-plus
entitled "S-plus in Teaching" by Henery faxed to the Examiner for review on January 7, 2002
is acceptable to demonstrate that both S-plus and hclust are well known in the art. The
Examiner indicated that a submission of a total of three references published prior to the
filing of the instant application by different groups would overcome the rejections. The
Examiner also indicated that he would accept evidence showing that S-plus is still
commercially available. Dr. Antler agreed to submit additional references and evidence of
commercial availability of S-plus in the response to the Office Action.

The claim rejections in the instant Office Action under 35 U.S.C. § 112, second
paragraph, were then discussed. Dr. Antler explained that the fractional improvement is
computed with respect to a cluster in going from one cluster into two clusters in a clustering
tree. Dr. Antler also proposed to amend the claims to include such a recitation. The Examiner
indicated that he would reconsider the rejection if such a recitation is included in the
claims.

The interview participants then discussed the claim rejections under 35 U.S.C. §
103(a) over Eisen et al., 1998, Proc. Natl. Acad. Sci. USA 95:14863. Dr. Antler explained, as
discussed below, that the reference does not make the claimed invention obvious. In
particular, Eisen does not teach or suggest determining among a plurality of genesets, each of
those that are upregulated or downregulated by a plurality of perturbations, and using these
determined genesets as the consensus profile. Dr. Antler proposed to amend the claims to
this effect. The Examiner agreed to consider such amendment.

The interview participants also discussed the claim rejections over claims reciting 
projection of cellular constituent sets, e.g., claim 29. Dr. Antler explained, as discussed 
below, that projection is independent of clustering, and thus Eisen's teaching of using 
supervised clustering to obtain clusters of genes teaches or suggests nothing about the 
projection of response profiles or projected response profiles. The Examiner agreed to 
reconsider the rejection. 

The interview participants also discussed the claim rejections over claims reciting 
methods comprising a step of determining the statistical significance of obtained cellular 
constituent sets, e.g., claim 17. Dr. Antler explained, as discussed below, that Eisen does not 
teach or suggest a method comprising a step of determining the statistical significance of 
obtained cellular constituent sets. The Examiner agreed that Eisen does not teach or suggest 
a method comprising a step of determining the statistical significance of obtained cellular 
constituent sets and that the rejections of these claims will be withdrawn. 

CORRECTION OF DRAWINGS 
The Examiner has indicated that Applicants are required to submit drawing 
corrections within the time period set for responding to the Office Action. Applicants submit 
herewith formal drawings consisting of 15 sheets of drawings corresponding to Figures 1-11. 

THE REJECTION UNDER 35 U.S.C. § 112, FIRST PARAGRAPH,
SHOULD BE WITHDRAWN
Claims 14, 22, 47, 61, 92 and 119 are rejected under 35 U.S.C. § 112, first paragraph,
as containing subject matter which was not described in the specification in such a way as to
enable one skilled in the art to which it pertains, or with which it is most nearly connected, to
make and/or use the invention. The Examiner contends that the algorithm of hclust is
essential subject matter for the practice of the above-listed claims and as such cannot be
enabled by incorporation by reference to a printed publication. Applicants respectfully
disagree with the Examiner for the reasons set forth below.

A patent need not teach, and preferably omits, what is well known in the art.
Spectra-Physics, Inc. v. Coherent, Inc., 827 F.2d 1524, 3 U.S.P.Q.2d 1737 (Fed. Cir. 1987).
Applicants respectfully point out that the software package S-plus, which includes hclust, is a
well known and widely used software package for performing statistical analysis and hclust is a
well known algorithm for performing hierarchical cluster analysis. As evidence that S-plus
and the hclust algorithm are well known in the art, Applicants submit as Exhibit C a printout of a
review of S-plus entitled "A Flavour of S-Plus" by Bowman and as Exhibit D a printout of a
review of S-plus entitled "S-plus in Teaching" by Henery. Henery discloses that since 1989
S-plus was adopted as the official language for teaching all Statistics courses at University of
Strathclyde, whereas Bowman discloses the use of S-plus as a teaching medium at University
of Glasgow. Applicants also direct the Examiner's attention to Weinstein et al., 1997,
Science 275:343-349, entitled "An information-intensive approach to the molecular
pharmacology of cancer," already submitted as reference GO in the Information Disclosure
Statement filed on October 5, 1999. As evidenced by note no. 21 on page 349, Weinstein
uses S-plus in its cluster analysis calculations. Furthermore, Applicants submit as Exhibit E
printouts of web pages of Insightful Corp., a vendor of S-plus, demonstrating that S-plus is
currently commercially available. As can be seen from these references, S-plus and hclust are
indeed both well known and widely used in the art. Therefore, anyone of skill in the art
would be readily capable of performing the claimed method of cluster analysis of response
profile data using the S-plus software and the hclust algorithm. Such a well known algorithm
is preferably omitted in the specification. Therefore, Applicants respectfully submit that the
rejection of claims 14, 22, 47, 61, 92 and 119 under 35 U.S.C. § 112, first paragraph, is in
error, and should be withdrawn.

THE REJECTION UNDER 35 U.S.C. § 112, SECOND PARAGRAPH,
SHOULD BE WITHDRAWN
Claims 18, 26, 50, 64, 96, 106 and 123 are rejected under 35 U.S.C. § 112, second
paragraph, as allegedly being indefinite. The Examiner contends that claims 18, etc., are
vague and indefinite regarding step (a) because in the step a fractional improvement is
determined but what is performed in order to obtain such an improvement is not indicated. In
supporting his above-mentioned contention, the Examiner also contends that step (a)
summarizes what is performed in steps (b)-(d).

Applicants have amended claims 18, 26, 50, 64, 96, 106 and 123 as described above.
Applicants further respectfully point out that a fractional improvement is defined at page 29,
lines 13-26, of the specification as an improvement in total scatter at a particular branch point
in a cluster tree with respect to the cluster centers in going from one cluster to two clusters.
Thus, a fractional improvement measures the change ("improvement") in total scatter if a
cluster is split into two clusters at a branching point. Applicants have amended the claims to
recite that in the claimed method the fractional improvement is an improvement in total
scatter with respect to a cluster center in going from one cluster to two clusters. An actual
fractional improvement is defined as the fractional improvement of the unpermuted data, i.e.,
data obtained from cluster analysis of the original data (see specification at page 29, lines 29-32).
Thus, in the rejected claims, step (a) determines a fractional improvement of the
unpermuted data; steps (b)-(d) perform permutation of the original data, cluster analysis on
the permuted data, and determination of a fractional improvement based on the permuted
data; and step (e) repeats the steps of (b)-(d), i.e., the steps of generating permuted responses
of cellular constituents, performing cluster analysis on the permuted responses of cellular
constituents, and determining fractional improvement based on the permuted data, to generate
a distribution of fractional improvements. Therefore, Applicants respectfully submit that
claims 18, 26, 50, 64, 96, 106 and 123 as amended are not indefinite, and that the rejection
under 35 U.S.C. § 112, second paragraph, should be withdrawn.
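
For illustration only, the relationship among steps (a) through (e) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the procedure of the specification: the fractional improvement is assumed to take the form 1 - (scatter of the two child clusters) / (scatter of the parent cluster); a simple two-seed split (the hypothetical helper split_in_two) stands in for one branch point of the hclust tree; and all function names are hypothetical.

```python
import random

def sqdist(a, b):
    """Squared Euclidean distance between two response vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def scatter(rows):
    """Total squared distance of the rows from their mean (the cluster center)."""
    if not rows:
        return 0.0
    n, dim = len(rows), len(rows[0])
    center = [sum(r[j] for r in rows) / n for j in range(dim)]
    return sum(sqdist(r, center) for r in rows)

def split_in_two(rows):
    """Stand-in for one branch point of a cluster tree (NOT hclust): seed with
    the two most distant rows, then assign every row to the nearer seed."""
    pairs = [(i, j) for i in range(len(rows)) for j in range(i + 1, len(rows))]
    i, j = max(pairs, key=lambda p: sqdist(rows[p[0]], rows[p[1]]))
    left = [r for r in rows if sqdist(r, rows[i]) <= sqdist(r, rows[j])]
    right = [r for r in rows if sqdist(r, rows[i]) > sqdist(r, rows[j])]
    return left, right

def fractional_improvement(rows):
    """Assumed form: 1 - (scatter of the two children) / (scatter of the parent)."""
    total = scatter(rows)
    if total == 0.0:
        return 0.0
    left, right = split_in_two(rows)
    return 1.0 - (scatter(left) + scatter(right)) / total

def permute_rows(rows, rng):
    """Step (b): shuffle the perturbation index independently for each row."""
    return [rng.sample(r, len(r)) for r in rows]

def permutation_test(rows, n_perm=200, seed=0):
    """Steps (a)-(e): actual value, null distribution, and empirical p-value."""
    rng = random.Random(seed)
    actual = fractional_improvement(rows)                    # step (a)
    null = [fractional_improvement(permute_rows(rows, rng))  # steps (b)-(d)
            for _ in range(n_perm)]                          # step (e)
    p_value = sum(f >= actual for f in null) / n_perm
    return actual, null, p_value
```

Here each row holds the responses of one cellular constituent across all perturbations, and the returned p-value is the fraction of permuted data sets whose fractional improvement is at least as large as the actual one. The sketch evaluates only the top-level split; the claims recite the same comparison for each cluster generated by the cluster analysis.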

THE REJECTIONS UNDER 35 U.S.C. § 103(a) 
SHOULD BE WITHDRAWN 

Claims 1, 3-8, 10-13, 15-17, 19-21, 23-25, 27-46, 48, 49, 58-60, 62, 63, 72-78, 89-91,
93-95, 97-100, 105, 107-113, 115-118, 120-122 and 124 are rejected under 35 U.S.C.
§ 103(a) as being unpatentable over Eisen et al., 1998, Proc. Natl. Acad. Sci. USA 95:14863
("Eisen"). The Examiner contends that "it would have been obvious to someone of ordinary
skill in the art at the time of the instant invention to perform the genome-scale expression
analysis of the reference with the clustering of data in order to determine those sets of genes
which are affected as to expression by various conditions." The Examiner also contends that
"the usage of supervised clustering in the reference as a known reference vector is reasonably
interpreted as the projected profiles of instant claim 29, for example, when further utilized in
subsequent analyses as suggested in the references." Claims 1, 3-8, 10-13, 15-17, 19-21, 23-25,
27-46, 48, 49, 58-60, 62, 63, 72-78, 89-91, 93-95, 97-100, 105, 107-118, 120-122 and 124
are rejected under 35 U.S.C. § 103(a) as being unpatentable over Eisen in view of Welsh,
U.S. Patent No. 5,686,114 ("Welsh"). The Examiner contends that "Welsh suggests and
motivates the issue of the study of drug toxicity along with dosing or treatment in drug
targeting ... it would have been obvious to someone of ordinary skill in the art at the time of
the instant invention to also profile drug toxicity along with efficacy in evaluating profiles of
drug perturbations as instantly claimed." Applicants respectfully disagree with the Examiner
for reasons set forth below.

A finding of obviousness under 35 U.S.C. § 103(a) requires a determination that the 
differences between the claimed subject matter and the prior art are such that the subject 
matter as a whole would have been obvious to one of ordinary skill in the art at the time the 
invention was made. Graham v. Deere, 383 U.S. 1 (1966). The relevant inquiry is whether
the prior art suggests the invention and whether the prior art provides one of ordinary skill in 
the art with a reasonable expectation of success. Both the suggestion and the reasonable 
expectation of success must be found in the prior art. In re Vaeck, 947 F.2d 488 (Fed. Cir. 
1991). 

Eisen teaches cluster analysis for analyzing the genome-wide expression data obtained 
from DNA microarray measurements. In Eisen, a microarray containing essentially every 
ORF from yeast is used to measure gene expression data of budding yeast during the diauxic 
shift, the mitotic cell division cycle, sporulation, and temperature and reducing shocks and a 
microarray with 9,800 cDNAs representing 8,600 distinct human transcripts is used to 
measure gene expression data of primary human fibroblasts stimulated with serum following
serum starvation. The gene expression data obtained from the microarray measurement are 
then analyzed using cluster analysis to identify gene expression patterns. Eisen suggests that 
genes of similar function cluster together. However, although Eisen teaches that genes can 
co-vary and therefore can be clustered into co-varying sets, Eisen does not teach or suggest 
that some of such co-varying sets of genes or clusters of genes, i.e., a group of co-varying sets
of genes, can be upregulated or downregulated by a particular collection of different drug
perturbations. Applicants note that genes clustered in different sets are not co-varying in
general (that is why they are clustered in different clusters) and do not in general respond to
different perturbations similarly. Nor does Eisen teach or suggest that the responses of the
sets of genes that similarly respond to a particular collection of different drug perturbations
can be used as the consensus profile for representing the response profiles of a cell in
response to such a collection of drug perturbations. Eisen's teaching that genes can be
clustered into co-varying sets does not provide one of ordinary skill in the art with a 
suggestion and reasonable expectation of success that a group of the co-varying sets can have 
similar responses, i.e., upregulated or downregulated, to a collection of different drug 
perturbations and that such a group of the co-varying sets of cellular constituents can be 
identified and their responses used to represent the response of a cell to the collection of drug 
perturbations, e.g., as the consensus profile of the cell in response to the collection of drug 
perturbations. Thus, Eisen does not teach a method comprising determining, for each of a 
plurality of sets of cellular constituents in a plurality of response profiles, whether said set of 
cellular constituents is upregulated or downregulated by said first plurality of drug 
perturbations, each response profile in said plurality of response profiles (i) comprising 
measurements of a plurality of cellular constituents, and (ii) resulting from a different drug 
perturbation among said first plurality of drug perturbations to said type of cell or organism, 
wherein each set of cellular constituents in said plurality of sets of cellular constituents 
consists of cellular constituents that co-vary under a second plurality of perturbations or that 
are co-regulated, wherein said plurality of response profiles comprises at least five response 
profiles, and wherein said consensus profile for said first plurality of drug perturbations 
comprises measurements of said set or sets of cellular constituents that are determined in 
said determining step to be upregulated or downregulated by said first plurality of drug 
perturbations. 

The Examiner also contends that the usage of supervised clustering in Eisen is 
reasonably interpreted as the projected profiles of instant claims, e.g., claim 29. Applicants 
respectfully submit that the Examiner's contention is erroneous. At the outset, Applicants
respectfully point out that the projection of response profiles according to a definition of
cellular constituent sets is independent of how the cellular constituent sets are defined and
obtained. Projection of response profiles is carried out after the cellular constituent sets have
been obtained by, e.g., either supervised or unsupervised clustering. The difference between
unsupervised and supervised clustering is whether predefined reference vectors are used to
obtain the cellular constituent sets. In contrast, as described in Section 5.3.4 of the
specification, projection of response profiles involves representing the response profiles in
terms of the basis cellular constituent sets, which are obtained from the cellular constituent
sets (see Section 5.3.2 for a description of basis cellular constituent sets). Therefore, Eisen's
teaching of using supervised clustering to obtain clusters of genes teaches or suggests nothing 
about the projection of response profiles or projected response profiles. 
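
For illustration only, the independence of projection from clustering can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the procedure of Section 5.3.4: each basis cellular constituent set is assumed to be represented by a unit-length indicator vector over its members, and the function names basis_vectors and project are hypothetical. The set memberships are taken as given input, so the same projection applies whether they came from supervised or unsupervised clustering.

```python
import math

def basis_vectors(constituent_sets, n_constituents):
    """One unit-length indicator vector per cellular constituent set
    (an assumed representation of the basis sets)."""
    basis = []
    for members in constituent_sets:
        weight = 1.0 / math.sqrt(len(members))
        vec = [0.0] * n_constituents
        for idx in members:
            vec[idx] = weight
        basis.append(vec)
    return basis

def project(profile, basis):
    """Express a response profile as one amplitude per basis set."""
    return [sum(b * x for b, x in zip(vec, profile)) for vec in basis]
```

Under these assumptions, a response profile measured over eight cellular constituents and two sets of four members each reduces to a projected profile of two amplitudes, one per set.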

In addition, regarding claims 17, 25, 49, 63, 95, and 122, Eisen does not teach or
suggest a method comprising a step of determining the statistical significance of obtained 
cellular constituent sets. 

Welsh teaches pharmaceutical compositions comprising an inorganic pyrophosphate 
for use in the treatment of a disease associated with inappropriate or inadequate ATP-binding 
cassette (ABC) protein activity. Welsh does not teach or suggest what is missing in Eisen, 
i.e., that a group of the co-varying cellular constituent sets can have similar responses, i.e., 
upregulated or downregulated, to a collection of different drug perturbations and that such a 
group of the co-varying sets of cellular constituents can be identified and their responses used
to represent the response of a cell to the collection of drug perturbations, e.g., as the 
consensus profile of the cell in response to the collection of drug perturbations. Welsh does 
not teach or suggest projection of response profiles onto basis cellular constituent sets or 
projected response profiles. Nor does Welsh teach or suggest a method comprising a step of 
determining the statistical significance of obtained cellular constituent sets. 

Therefore, Applicants respectfully submit that neither Eisen nor Eisen in view of 
Welsh renders the present invention unpatentable, and that the rejection under 35 U.S.C. 
§ 103(a) over Eisen and the rejection under 35 U.S.C. § 103(a) over Eisen in view of Welsh 
should be withdrawn. 

CONCLUSION 

Applicants respectfully request entry of the foregoing amendments and remarks into 
the file of the above-identified application. Applicants believe that each ground for rejection 
has been successfully overcome or obviated, and that all the pending claims are in condition
for allowance. Withdrawal of the Examiner's rejections and allowance of the application are
respectfully requested. 



Respectfully submitted,

Date: April 9, 2002

Adriane M. Antler (Reg. No. 32,605)

PENNIE & EDMONDS LLP
1155 Avenue of the Americas
New York, New York 10036-2711
(212) 790-9090

Enclosures

EXHIBIT A: MARKED VERSION OF THE AMENDED CLAIMS

U.S. APPLICATION SERIAL NO. 09/220,142
(ATTORNEY DOCKET NO. 9301-035-999)

(as amended April 9, 2002)



1. (Five Times Amended) A method of determining a consensus profile for a first
plurality of drug perturbations to a cell type or organism, said method comprising [identifying 
among] determining, for each of a plurality of sets of cellular constituents in a plurality of 
response profiles [one or more sets of cellular constituents], [each of said one or more sets of] 
whether said set of cellular constituents [being] is upregulated or downregulated by said first 
plurality of drug perturbations, each response profile in said plurality of response profiles (i) 
comprising measurements of a plurality of cellular constituents, and (ii) resulting from a 
different drug perturbation among said first plurality of drug perturbations to said type of cell 
or organism, wherein each set of cellular constituents in said plurality of sets of cellular 
constituents consists of cellular constituents that co-vary under a second plurality of 
perturbations or that are co-regulated, wherein said plurality of response profiles comprises at 
least five response profiles, and wherein said consensus profile for said first plurality of drug 
perturbations comprises measurements of said [one or more] set or sets of cellular 
constituents that are determined in said determining step to be upregulated or downregulated 
by said first plurality of drug perturbations.

18. (Twice Amended) The method of claim 17, wherein said cluster analysis is carried 
out by a hierarchical clustering method, and wherein the objective statistical test comprises: 

(a) determining for each cluster generated by said cluster analysis an actual
fractional improvement in the cluster analysis of the cellular constituents
based on the unpermuted responses of said cellular constituents, wherein said
fractional improvement is an improvement in total scatter with respect to a
cluster center in going from one cluster to two clusters;

(b) generating permuted [response] responses of cellular constituents by means of
Monte Carlo randomization of perturbation index for the response of each
cellular constituent across all perturbations;

(c) performing said cluster analysis on the permuted [response] responses of
cellular constituents;

(d) determining for each cluster generated in step (c) the fractional improvement
in the cluster analysis of cellular constituents based on the permuted
[response] responses of cellular constituents, wherein said fractional
improvement is an improvement in total scatter with respect to a cluster center
in going from one cluster to two clusters; and

(e) repeating [said] steps [of generating permuted response of cellular constituents
and performing cluster analysis on the permuted response of cellular
constituents] (b) through (d) so that a distribution of fractional improvements
in the cluster analysis of the cellular constituents is obtained for each said
cluster generated by said cluster analysis;

wherein the statistical significance of each of said sets of co-varying cellular constituents is
determined by comparing the actual fractional improvement for the corresponding cluster to
the distribution of fractional improvements for the corresponding cluster.

26. (Twice Amended) The method of claim 25, wherein said cluster analysis is carried 
out by a hierarchical clustering method, and wherein the objective statistical test comprises: 

(a) determining for each cluster generated by said cluster analysis an actual
fractional improvement in the cluster analysis of the unpermuted response
profiles, wherein said fractional improvement is an improvement in total
scatter with respect to a cluster center in going from one cluster to two
clusters;

(b) generating permuted response profiles by means of Monte Carlo
randomization of cellular constituent index for each response profile across the
measured cellular constituents;

(c) performing said cluster analysis on the permuted response profiles;

(d) determining for each cluster generated in step (c) the fractional improvement
in the cluster analysis [on] of the permuted response profiles, wherein said
fractional improvement is an improvement in total scatter with respect to a
cluster center in going from one cluster to two clusters; and

(e) repeating [said] steps [of generating permuted response profiles and
performing cluster analysis on the permuted response profiles] (b) through (d)
so that a distribution of fractional improvements in the cluster analysis of the
response profiles is obtained for each said cluster generated by said cluster
analysis;

wherein the statistical significance of each of said sets of response profiles is determined by
comparing the actual fractional improvement for the corresponding cluster to the distribution
of fractional improvements for the corresponding cluster.

29. (Three Times Amended) A method of determining a consensus profile for a first
plurality of perturbations to a cell type or organism, said method comprising [identifying 
among] determining, for each of a plurality of sets of cellular constituents in a plurality of 
projected profiles [one or more sets of cellular constituents], [each of said one or more sets 
of] whether said set of cellular constituents [being] is upregulated or downregulated by said 
first plurality of perturbations, each projected profile in said plurality of projected profiles 

(i) resulting from a different perturbation among said first plurality of perturbations to 
said type of cell or organism, and 

(ii) comprising measurements of a plurality of cellular constituents in said type of cell 
or organism that have been projected onto basis cellular constituent sets, said basis cellular 
constituent sets being defined by co-variation of measurements of cellular constituents under 
a second plurality of different perturbations, wherein said consensus profile for said first 
plurality of perturbations comprises projected measurements of said [one or more] set or sets 
of cellular constituents that are determined in said determining step to be upregulated or 
downregulated by said first plurality of perturbations.

38. (Four Times Amended) A method of determining a consensus profile for a first 
plurality of perturbations to a cell type or organism, said method comprising [identifying 
among] determining, for each of a plurality of sets of genes in a plurality of response profiles 
[one or more sets of genes], [each of said one or more sets of] whether said set of genes
[being] is upregulated or downregulated by said first plurality of perturbations, each response
profile in said plurality of response profiles (i) comprising measurements of transcript levels
for a plurality of genes, and (ii) resulting from a different perturbation among said first
plurality of perturbations to said type of cell or organism, wherein each set of genes in said
plurality of sets of genes consists of genes having transcripts that co-vary under a second
plurality of perturbations or that are co-regulated, and wherein said consensus profile for said
first plurality of perturbations comprises measurements of transcript levels for said [one or
more] set or sets of genes that are determined in said determining step to be upregulated or
downregulated by said first plurality of perturbations.

50. (Twice Amended) The method of claim 49, wherein said cluster analysis is carried
out by a hierarchical clustering method, and wherein the objective statistical test comprises:

(a) determining for each cluster generated by said cluster analysis an actual
fractional improvement in the cluster analysis of the unpermuted response
profiles, wherein said fractional improvement is an improvement in total
scatter with respect to a cluster center in going from one cluster to two
clusters;

(b) generating permuted response profiles by means of Monte Carlo
randomization of gene index for each response profile across the measured
genes;

(c) performing said cluster analysis on the permuted response profiles;

(d) determining for each cluster generated in step (c) the fractional improvement
in the cluster analysis of the permuted response profiles, wherein said
fractional improvement is an improvement in total scatter with respect to a
cluster center in going from one cluster to two clusters; and

(e) repeating [said] steps [of generating permuted response profiles and
performing cluster analysis on the permuted response profiles] (b) through (d)
so that a distribution of fractional improvements in the cluster analysis of the
response profiles is obtained for each cluster generated by said cluster
analysis;

wherein the statistical significance of each of said sets of response profiles is determined by
comparing the actual fractional improvement for the corresponding cluster to the distribution
of fractional improvements for the corresponding cluster . 

64. (Twice Amended) The method of claim 63, wherein said cluster analysis is carried 
out by a hierarchical clustering method, and wherein the objective statistical test comprises: 

(a) determining for each cluster generated by said cluster analysis an actual 
fractional improvement in the cluster analysis of cellular constituents based on 
the unpermuted responses of said cellular constituents, wherein said fractional 
improvement is an improvement in total scatter with respect to a cluster center 
in going from one cluster to two clusters; 

(b) generating permuted [response] responses of cellular constituents by means of 
Monte Carlo randomization of the perturbation index for each cellular 
constituent across all perturbations; 

(c) performing said cluster analysis on the permuted [response] responses of 
cellular constituents; 

(d) determining for each cluster generated in step (c) the fractional improvement in 
the cluster analysis of cellular constituents based on the permuted [response] 
responses of cellular constituents, wherein said fractional improvement is an 
improvement in total scatter with respect to a cluster center in going from one 
cluster to two clusters; and 

(e) repeating [said] steps [of generating permuted response of cellular constituents 
and performing cluster analysis on the permuted response of cellular 
constituents] (b) through (d) so that a distribution of fractional improvements 
in the cluster analysis of the cellular constituents is obtained for each cluster 
generated by said cluster analysis; 

wherein the statistical significance of each of said sets of cellular constituents is determined 
by comparing the actual fractional improvement for the corresponding cluster to the 
distribution of fractional improvements for the corresponding cluster. 

96. (Amended) The method of claim 95, wherein said cluster analysis is carried out by 
a hierarchical clustering method, and wherein the objective statistical test comprises: 

(a) determining for each cluster generated by said cluster analysis an actual 
fractional improvement in the cluster analysis of the cellular constituents 
based on the unpermuted responses of said cellular constituents, wherein said 
fractional improvement is an improvement in total scatter with respect to a 
cluster center in going from one cluster to two clusters; 

(b) generating permuted [response] responses of cellular constituents by means of 
Monte Carlo randomization of the perturbation index for response of each 
cellular constituent across the set of perturbations; 

(c) performing said cluster analysis on the permuted [response] responses of 
cellular constituents; 

(d) determining for each cluster generated in step (c) the fractional improvement 
in the cluster analysis of cellular constituents based on the permuted [response] 
responses of cellular constituents, wherein said fractional improvement is an 
improvement in total scatter with respect to a cluster center in going from one 
cluster to two clusters; and 

(e) repeating [said] steps [of generating permuted response of cellular constituents 
and performing cluster analysis on the permuted response of cellular 
constituents] (b) through (d) so that a distribution of fractional improvements 
in the cluster analysis of the cellular constituents is obtained for each cluster 
generated by said cluster analysis; 

wherein the statistical significance of each of said sets of co-varying cellular constituents is 
determined by comparing the actual fractional improvement for the corresponding cluster to 
the distribution of fractional improvements for the corresponding cluster. 

106. (Amended) The method of claim 105, wherein said cluster analysis is carried out 
by a hierarchical clustering method, and wherein the objective statistical test comprises: 

(a) determining for each cluster generated by said cluster analysis an actual 
fractional improvement in the cluster analysis of the unpermuted response 
profiles, wherein said fractional improvement is an improvement in total 
scatter with respect to a cluster center in going from one cluster to two 
clusters; 

(b) generating permuted response profiles by means of Monte Carlo 
randomization of cellular constituent index for each response profile across 
the measured cellular constituents; 

(c) performing said cluster analysis on the permuted response profiles; 

(d) determining for each cluster generated in step (c) the fractional improvement 
in the cluster analysis of the permuted response profiles, wherein said 
fractional improvement is an improvement in total scatter with respect to a 
cluster center in going from one cluster to two clusters; and 

(e) repeating [said] steps [of generating permuted response profiles and 
performing cluster analysis on the permuted response profiles] (b) through (d) 
so that a distribution of fractional improvements in the cluster analysis of the 
response profiles is obtained for each cluster generated by said cluster 
analysis; 

wherein the statistical significance of each of said sets of response profiles is determined by 
comparing the actual fractional improvement for the corresponding cluster to the distribution 
of fractional improvements for the corresponding cluster. 

123. (Amended) The method of claim 122, wherein said cluster analysis is carried out 
by a hierarchical clustering method, and wherein the objective statistical test comprises: 

(a) determining for each cluster generated by said cluster analysis an actual 
fractional improvement in the cluster analysis of the genes based on the 
unpermuted responses of said genes, wherein said fractional improvement is 
an improvement in total scatter with respect to a cluster center in going from 
one cluster to two clusters; 

(b) generating permuted [response] responses of genes by means of Monte Carlo 
randomization of perturbation index for the response of each gene across all 
perturbations; 

(c) performing cluster analysis on the permuted [response] responses of genes; 

(d) determining for each cluster generated in step (c) the fractional improvement 
in the cluster analysis of genes based on the permuted [response] responses of 
genes, wherein said fractional improvement is an improvement in total scatter 
with respect to a cluster center in going from one cluster to two clusters; and 

(e) repeating [said] steps [of generating permuted response of genes and 
performing cluster analysis on the permuted response of genes] (b) through (d) 
so that a distribution of fractional improvements in the cluster analysis of the 
genes is obtained for each cluster generated by said cluster analysis; 

wherein the statistical significance of each of said sets of co-varying genes is determined by 
comparing the actual fractional improvement for the corresponding cluster to the distribution 
of fractional improvements for the corresponding cluster. 

A Flavour of S-Plus 



by Adrian Bowman 
University of Glasgow 



This article appears in the February 1993 issue of the newsletter Maths&Stats, as part of a 
special S-Plus supplement. It is based on a talk given at the S-Plus Workshops in 1992. 



In introducing any new package, it is probably easiest to contrast it with a package which is 
already well known, such as MINITAB which is widely used in teaching Introductory 
Courses throughout the country. Something of the style of S-Plus is indicated by the 
following statements. 

x <- scan("haem.dat") 
d <- matrix(x, ncol = 6, byrow = T) 
pairs(d) 

The effect of the first two of these statements is to read a file of data, referring to blood 
measurements of a group of workers and reported by Royston (Applied Statistics 1983, 32, 
121-33), into a matrix called d. Aficionados of MINITAB might instantly claim that this is 
much more cumbersome than MINITAB's simple 'read' command. The first S-Plus 
statement stores all the data as a long string of numbers in the vector x. The second 
statement then reorganises this by filling up a matrix d in seven columns along the rows. 
One advantage of this is the flexibility of being able to handle files where records are spread 
perhaps irregularly over several lines of a file. These commands also indicate another 
feature of S-Plus which is that everything is done through the use of functions. The 
assignment operator <- takes the result of the function and stores it in the object on the left 
hand side. The functional nature of the language means that statements can be combined as, 
for example, 

d <- matrix(scan("haem.dat"), ncol = 6, byrow = T) 

The third original statement, pairs(d), illustrates one of the main strengths of S-Plus. The 
result of this function is to take the columns of the matrix d and produce all the pair-wise 
scatterplots which can be created from these columns. The resulting picture is displayed in 
Figure 1. S-Plus is extremely good at producing high quality graphics which, given the right 
equipment, can be sent to a laser printer at the press of an electronic button. 

As a further example of this, the following command produces a dendrogram as displayed in 
Figure 2: 

plclust(hclust(dist(d), method = "connected")) 

Here the parameter method = "connected" selects the use of single link in clustering. After 
using packages where the standard output from a cluster analysis is a long list of numerical 
information, this form of high quality graphics is a delight. S-Plus is particularly strong in 
multivariate methods and has routines for a wide variety of techniques. Most of these 
graphical methods are extremely easy to use. As a package for giving students relatively 
painless experience of more advanced techniques, S-Plus is an extremely useful teaching 
medium. 
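
To make the single-link rule mentioned above concrete, here is a minimal sketch written in Python rather than S-Plus. It is an illustration under stated assumptions, not the algorithm inside hclust: the function name single_link, the use of Euclidean distance and the four sample points are all invented for the example. At each step it merges the two clusters whose closest members are nearest to each other and records the height of that merge, which is the quantity a dendrogram displays.

```python
import math


def single_link(points):
    """Agglomerative single-link clustering (hypothetical helper, not hclust).

    Returns the merge history as (members_a, members_b, height) tuples, where
    height is the smallest Euclidean distance between a point in one cluster
    and a point in the other.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Examine every pair of current clusters and keep the closest pair.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                height = min(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or height < best[0]:
                    best = (height, a, b)
        height, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), height))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges


# Four made-up observations in two dimensions.
pts = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.5)]
history = single_link(pts)
for left, right, height in history:
    print(left, right, round(height, 3))
```

The brute-force search over all pairs is chosen for readability only; it would be far too slow for data sets of realistic size.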

The role of packages in allowing students to carry out data analysis and to provide an 
environment within which to model data is probably the most important one. There are 
however other uses of the computer in teaching. One of these is to take basic ideas or 
techniques and allow students to explore the meaning and properties of them. This is where 
the fact that S-Plus is a full blooded programming language comes into its own. It is very 
easy to string together basic S-Plus commands and to create these as a new function. Figure 
3 illustrates the output of a function designed to take 100 samples of size 25 from a normal 
population with mean 69 and display the resulting computed confidence intervals. This 
simple graphical illustration communicates the fact that confidence intervals are random 
intervals and that confidence comes from the fact that, on average, 95% of these intervals 
will capture the true value. Simulations can be repeated by calling the function as many 
times as required. In the past it has been necessary to write special purpose software to 
produce this kind of graphical illustration but the presence of packages with sophisticated 
graphics and programming facilities now means that we have the advantage of being able to 
do this kind of exercise within an environment which has all the standard tools at our 
disposal. 
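
The confidence interval simulation described above can be sketched in a few lines. The version below is in Python rather than S-Plus and is not the function used for Figure 3. The population standard deviation of 3 is an assumption (the text gives only the mean of 69), the names are invented, and the intervals are the usual t-based 95% intervals for samples of size 25.

```python
import random
import statistics

N = 25              # sample size, as in the text
T_CRIT_24 = 2.0639  # 97.5% point of Student's t with N - 1 = 24 degrees of freedom


def simulate_intervals(n_samples=100, mean=69.0, sd=3.0, seed=None):
    """Return (lower, upper, covers) for each simulated 95% interval.

    sd=3.0 is an assumed value; the article does not state the population
    standard deviation.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_samples):
        sample = [rng.gauss(mean, sd) for _ in range(N)]
        centre = statistics.mean(sample)
        half_width = T_CRIT_24 * statistics.stdev(sample) / N ** 0.5
        lower, upper = centre - half_width, centre + half_width
        results.append((lower, upper, lower <= mean <= upper))
    return results


intervals = simulate_intervals(seed=1)
covered = sum(1 for _, _, ok in intervals if ok)
print(f"{covered} of {len(intervals)} intervals contain the true mean")
```

Calling simulate_intervals again with a different seed repeats the experiment, and the count of covering intervals varies from run to run around 95.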

Figure 4 gives some output from a function written to display repeated measurements data. 
With this type of data observations are repeated on individuals across time. The data here 
refer to the levels of luteinizing hormone in two groups of cows, reported by Raz 
(Biometrics (1989) 45, 851-71). It is a very commonly occurring data structure but one 
which many data packages find difficult to handle. Some years ago, I wrote a programme in 
BBC Basic on a PC which took up several pages of code. The function to produce these 
repeated measurements plots takes up less than two pages of S-Plus code and is far more 
flexible and powerful. 

The acid test of any particular package is whether it is used in practice. I can say that I am 
now a wholehearted and enthusiastic S-Plus user although it is not yet available within my 
own department as a teaching vehicle. I use it routinely for research and consulting 
problems. As previously indicated, its strengths lie in the high quality nature of the graphics, 
the easy access to some sophisticated and powerful modelling tools, and the flexible nature 
of the programming environment. 

With any package there will always be disadvantages. Until recently there were some 
surprising omissions in the area of elementary techniques and of designed experiments. This 
has to a large extent been remedied in version 3 of the package. The documentation is not 
yet ideal as the result of the history of the development of the software but this is also 
improving. One area which is still handled rather patchily is that of missing data. Some 
functions will cope with this whereas others will not operate when missing data are detected. 
Despite these disadvantages, S-Plus is an extremely powerful and sophisticated package. It 
was designed for UNIX systems and works extremely well on Sun workstations, although it 
can be slow if 'for' loops are employed. It is now available for DOS too, and John Hinde has 
reviewed this in an article. 



Adrian Bowman 
adrian@stats.gla.ac.uk 

Figure 1 

Figure 2 

Figure 4 

STATISTICS 



S-Plus in Teaching 



by Bob Henery 
University of Strathclyde 



This article appears in the February 1993 issue of the newsletter Maths&Stats, as part of a 
special S-Plus supplement. It is based on a talk given at the S-Plus Workshops in 1992. 



Background | Developments in Splus | Documentation | Help - functions and data | Data Manipulation | 
Graphical Procedures | Interfaces to C/FORTRAN | Statistical Procedures | Teaching | Library | Equipment | 
Comparison with MINITAB, GLIM, ... 

Background 

For several years, I had been giving a 20-hour course to Final Year Honours students at 
Strathclyde University. The course covered non-normal data such as contingency tables, 
Generalised Linear Models, canonical variates, etc. with the computing side based on 
MINITAB and GLIM. At least half of the time was spent on theory, partly because no 
suitable introductory text was available for many years, although the situation has improved 
with the appearance of the second edition of McCullagh and Nelder (1990). 

However, with the installation of a laboratory of 20 Sun workstations in October 1989, 
S-plus was adopted as the 'official' language for teaching all Statistics courses, (not only 
Final Year), and this required a complete revision of the content and format of the course. 
To make best use of the strong points of S-plus, modern statistical methods combined with 
excellent graphics, it was decided to give the minimum of theory and to concentrate on 
demonstrations, practicals and commentaries on the examples provided. The advantage was 
that the students could get some experience in using a large number of procedures, the 
disadvantage being that each topic was covered superficially. 

Developments in Splus 

Splus is a modern statistical package: it is constantly being improved by the addition of new 
procedures and its basic structure also undergoes an evolutionary process. Each new version 
brings with it a greater range of functions and greater power in the ability to process classes 
of datasets. Improvements in performance are not always monotonic however, and Splus 
version 3.0 was noticeably slower than Splus 2.3, a situation that was rapidly corrected with 
the introduction in October 1992 of version 3.1, which is the best version to date. Make 
sure therefore, when ordering Splus, that the version number is 3.1 or later. 

Originally the course was based on Splus 2.2 (October 1989), but that was updated to Splus 
2.3 (June 1990) and thereafter to Splus 3.0 (June 1992). It will be necessary to update to 
version 3.1 soon. Each version entails some re-learning of the system for teacher and 
students alike. 

Documentation 

There are now three text-books dealing with the language S: 

1. "The New S Language" by Becker, R., Chambers J.M. and Wilks A.R. (1988); 

2. "Statistical Models in S", eds. Chambers J.M. and Hastie, T.J. (1992); 

3. "Data Analysis by Using S" by Sibuya, M. and Shibata, R. (1992). 

The first is more concerned with programming; the second deals extensively with a few 
selected statistical models and their treatment in S; and the third is aimed more at students. 

The S-PLUS User's Manual is also a very useful guide to statistical and graphical 
procedures, but it is probably not suitable for students, who will instead want to use the 
online help facility for a narrow range of procedures. 

Help - functions and data 

Online information is available on both procedures and datasets using the help() command. 
The command help(plclust), for example, creates a window containing a complete 
specification of the plclust procedure, just as it appears in the New-S book. Of particular 
importance for teaching purposes is the liberal use of examples in the help files, which may 
be copied directly to the student's command window, and executed without further ado since 
they mostly refer to the provided datasets, of which there are several. Another good feature 
is that the help documentation contains very informative summaries of the theory and 
applications of important statistical procedures. By sending appropriate help files to the laser 
printer, each student may prepare for himself a customised set of notes, including worked 
examples. Another useful feature is that brief descriptions of the datasets are also given, for 
example on Fisher's iris data by help(iris). 

Data Manipulation 

Most work in S is done by assigning expressions. Objects in S may be of type logical, numeric, 
character (alphanumeric), and structures may be vectors, matrices, or lists. To give some feel 
for how S looks in practice, here is a complete example (given in help(plclust)) for 
generating a hierarchical clustering plot. 

# the example plot is produced by: 
suntools() 
sums <- apply(author.count, 1, sum) 
adjusted <- sweep(author.count, 1, sums, "/") 
par(mar=c(18,4,4,1)) 
plclust(hclust(dist(adjusted)), label=dimnames(author.count)[[1]]) 
title("Clustering of Books Based on Letter Frequency") 

Even complete novices can run the above example by clicking the mouse buttons and 
copying and pasting. It only remains to motivate the procedure (hierarchical clustering) and 
to give a commentary on the output, the graphical display of which is shown as Figure 1. 
Incidentally, the command used to add the caption "Figure 1" to the graph is 

text(locator(1), "Figure 1", cex=2.0) 



the caption being superimposed at the cursor when the mouse is clicked. Unfortunately, 
what you see on the graph is not always what you get on the laser printer. However, this 
difficulty is readily overcome by trial and error, and with almost no effort at all students are 
producing high-quality graphics for their reports. 

The dimnames function used in the above example is another useful feature. Names of 
variables or objects (in the above example, names of books and authors) may be used to 
label graphs or to identify outliers. Dimnames are propagated through many procedures, 
with the result that residuals can be identified readily with the associated object or variable. 
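
For readers unfamiliar with apply and sweep, the two lines that build `adjusted` in the clustering example simply divide every count by its row total, so that each book is described by letter proportions rather than raw counts. The following is a rough Python equivalent, offered as a sketch only: the function name row_proportions and the tiny table of counts are invented for illustration, and a dictionary keyed by book name stands in for the row dimnames.

```python
def row_proportions(rows):
    """Divide each entry by its row total (hypothetical helper).

    This mirrors computing row sums with apply(..., 1, sum) and then
    dividing them out with sweep(..., 1, sums, "/").
    """
    adjusted = []
    for row in rows:
        total = sum(row)
        if total == 0:
            raise ValueError("cannot normalise a row whose total is zero")
        adjusted.append([value / total for value in row])
    return adjusted


# Made-up letter counts: one row per book, one column per letter.
counts = {"Book A": [30, 50, 20], "Book B": [5, 10, 5]}

# Keep the book names attached to the rows, much as dimnames are propagated.
adjusted = dict(zip(counts, row_proportions(counts.values())))
for name, proportions in adjusted.items():
    print(name, proportions)
```

After this step books of very different lengths become comparable, which is why the example normalises before computing distances.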



Graphical Procedures 

These are excellent, with a range of modern procedures, such as brush and spin (which 
students enjoy). If required, full control may be exercised over the format and content of all 
graphs, by labelling, adding lines or text etc. Once completed, graphs may be sent to a laser 
printer or Hewlett-Packard plotter. 



Interfaces to C/FORTRAN 



Interfaces to C and FORTRAN are possible, but for teaching purposes these are best hidden. 
However, course organisers who find that their favourite procedure is not supplied, may 
wish to code it themselves in FORTRAN say, and it may then be interfaced to S-plus. There 
are several restrictions on such FORTRAN routines, so some rewriting of procedures is 
almost certain. More probably, any deficiencies may be rectified by writing new procedures 
in S. 

Statistical Procedures 



There are three sources for S-plus procedures. Standard S-plus procedures are provided by 
Statistical Sciences Inc. who also give user support for these procedures. Because the 
standard S-plus functions do not include simple tests, a suite of programs called NESI (New 
Environment for Statistical Inference) has been provided by Prof. Shibata and colleagues at 
Keio University, Japan. The NESI functions are now included in the Splus package. Finally, 
the odd user-supplied function may be spotted in the mailing list s-news or at Statlib (see 
the Statistics Resources on the Web pages for more information). 

S-plus has far too many functions to list, so here is a representative list of functions new to 
each version of Splus: 



S-plus 2.2    hclust      - hierarchical clustering 
              mstree      - minimum spanning tree 
              discr       - discrimination 
              prcomp      - principal components 

S-plus 2.3    glim        - generalised linear models 
              ace         - alternating conditional expectation 
              avas        - additive nonlinear regression with variance stabilization 
              ppreg       - projection pursuit regression 
              nlmin       - non-linear minimization 

S-plus 3.0    glm         - generalised linear models with factors, interactions, etc. 
              var.test    - F variance ratio test 
              t.test      - one or two sample t test 
              wilcox.test - Wilcoxon signed rank or Mann-Whitney test 



Teaching 

From the outset, the approach was to introduce the students to as many important modern 
techniques as possible. S-plus is strong on exploratory data analysis, so at times the 
theoretical treatment was at best sketchy, and even non-existent. However, by going over to 
100% coursework assessment, a modular approach was possible by which each topic could 
be taken in isolation. Each topic, which might involve two or three related S-plus 
procedures, was dealt with by some introductory lectures and demonstrations followed by a 
coursework assessment involving a mini-project plus report, the whole topic occupying two 
weeks. In writing up the report, the emphasis was on the procedures in question and not on 
the particular dataset, and a full analysis of the data was not required. In this way many more 
procedures were discussed than formerly, and although the drawback is that the treatment 
was more superficial, yet the students enjoyed the course more, and felt that it was useful to 
know about as many procedures as possible. However, some students felt that more lectures 
would have been useful. 

Some of the newer S-plus procedures will be unfamiliar to the lecturers, never mind the 
students, and some have not stood the test of time. So, as an experiment, a couple of 
procedures were introduced with no preparatory lectures, the idea being to see how the 
students would cope with no aid from the lecturer, relying solely on their natural intelligence 
and the help facility. This experiment was only moderately successful, (was the fault in their 
intelligence or the help facility?), but I will try again next year! 

Library 

Collections of procedures may be gathered together for a specific course or a suite of 
programs for a specific purpose may conveniently be placed in a library. For example, Brian 
Yandell has developed a suite of programs under the heading 'penalised likelihood 
generalised linear models', and these were put in a library (pglm). Procedures for my own 
course, which included a customised version of discr for multivariate analysis of variance 
and a specially written procedure for correspondence analysis, were placed in library(da2). 
Also in library(da2) were instructions for projects, and the writing of reports etc. 

With Splus version 3.0 and later it will be particularly easy to construct a suite of procedures 
dealing with classes of datasets. 

Equipment 

The teaching lab for the course now consists of 20 Sun Sparcstations 1+: the older 3/80 
workstations were just a little on the slow side. The workstations are served by a central Sun 
386i, and have access to a QMS810 laser printer and Hewlett-Packard 7550A graph plotter. 
Typically there are 15-20 students in the class, and occasionally there is a log-jam at the 
printer when graphs and help files are in great demand, but a modicum of discipline smooths 
the problem away. 

S-plus requires a Unix environment, and this means that a workstation is advisable. 
Although a PC version is now available, I believe it requires a considerably beefed-up PC 
with lots of memory and a fast disc, and I do not know the relative merits of the Sun and PC 
versions. 

Comparison with MINITAB, GLIM, ... 

The S-plus package has been reviewed by D.G. Fraser, who compares the scope and 
complexity of S-plus with SAS, BMDP or Genstat (in Bull. Inst. Maths. Applic. 26, no. 3). 
It is certainly not as easy to learn as MINITAB, nor is it so powerful as GLIM, but the 
combination of high-quality graphics and modern statistical procedures makes it very 
attractive for the research worker or as the basis of a Final Year or Postgraduate course in 
applied statistics. For lower level courses, that probably require only basics such as t- or 
F-tests, there is no good reason for changing to S-plus other than the very good graphics. 

Bob Henery 

bob@stams.strath.ac.uk 

at Lucent Technologies' Bell Labs specifically for data visualization, exploration and 
programming with data. The S System has been recognized with the prestigious Association 
for Computing Machinery Software Systems Award (other recipients include UNIX, 
TCP/IP and Mosaic). With more than 4,200 statistical, graphical and programming 
functions built in, you can create applications in S in a fraction of the time it would 
take using lower-level languages like C or Visual Basic. The S language is 
platform-independent, so your applications will run on either Windows or Unix. 



Data insight at your fingertips. 

S-PLUS's intuitive graphical user interface offers the look-and-feel of Microsoft Office 
applications, making it easy for you to access and analyze data. Embed graphs into Word or 
PowerPoint presentations with point-and-click ease and share your results with key decision makers. 

Microsoft Excel integration improves data import and transfer capabilities. 

You can open Excel worksheets within S-PLUS, perform analyses and create graphics directly from 
your data. Since your data stays in Excel, you won't spend time transferring results back and forth 
between Excel and S-PLUS. 

Easily import and export data from Oracle, SAS, SPSS and other standard formats. 

S-PLUS makes it easy to access data from virtually any source including Excel, SAS, SPSS and data 
bases including Oracle, Sybase and SQL Server. S-PLUS offers extensive import and export 
capabilities to help you move data from one file format to another. 




Easily import and export data from the following formats 

■ SAS 
■ SPSS 
■ Excel 
■ Text (ASCII) 
■ Quattro Pro 
■ Paradox 
■ Lotus 1-2-3 
■ dBase 
■ SigmaPlot 
■ Systat 
■ Gauss 
■ Access 
■ MATLAB 
■ LIM 
■ Bloomberg 
■ FAME 
■ Minitab 
■ FoxPro 
■ Epi Info 
■ Informix 
■ Oracle 
■ Sybase 
■ SQL Server 
■ ODBC 



Export graphics in the following formats 

■ Windows Bitmap (.BMP) 
■ Encapsulated PostScript (.EPS) 
■ CompuServe GIF (.GIF) 
■ GEM Bitmap (.IMG) 
■ JPEG (.JPG) 
■ Adobe Photoshop (.PSD) 
■ Adobe (.PDF) 
■ HP Printer Control Language 
■ PaintBrush (.PCX) 
■ Tagged Image File Format (.TIF) 
■ Truevision Targa (.TGA) 
■ Windows Metafile (.WMF) 
■ Portable Network Graphics (.PNG) 
■ PCD 

Solution for Data Analysis 



S-PLUS Graphlets™ offer real-time 
Web-based interactive graphics. 



Share your discoveries using 
interactive graphics. 

S-PLUS graphics are object-oriented so 
you can customize every level of detail 
to create the perfect graphic for your 
presentation. S-PLUS lets you interact 
with your data and identify unusual 
values or select relevant subsets. And 
S-PLUS's new Graphlets™ technology 
allows you to publish your graphics on 
the Web and give your readers the 
opportunity to interact with your data 
in real time. 



Introducing a new graphics format that 
allows you to add interactivity directly 
into graphics published on the Web. 
S-PLUS Graphlets offer the flexibility of 
the S-PLUS graphics engine to create exactly 
the graph you want, and then make it 
dynamic by allowing the viewer to drill 
down into your data or create hyperlinks 
from your data points to other information 
or graphics located elsewhere on the 
Web. Examples of S-PLUS Graphlets can be 
found at www.insightful.com/graphlets. 



Visualize multi-dimensional 
data using Trellis graphics. 

S-PLUS is the only data analysis package to offer Trellis graphics, a revolutionary way to visualize 
relationships in multi-dimensional data. Developed by researchers at Bell Labs, Trellis graphics help you 
discover hidden relationships in your data by computing graphical views sliced on one or more 
conditioning variables. No other graphing technique has as much power and flexibility. 

Select from more publication-quality graphics and formats. 

S-PLUS offers an extensive selection of 2D and 3D graph types. From histograms to bar charts to 
scatterplots, the graphics library in S-PLUS is truly comprehensive. Easily customize your graphs, 
including line weights, colors and fonts, for publication-quality graphics. 



NEW S-PLUS Graphlets

Now you can create interactive graphics. Viewers can drill down into the graphic
for more information or linked Web pages.
Visualize Your Data With S-PLUS



NEW! Excel Link. Now you can open Excel
worksheets from within S-PLUS. Perform
analyses and create publication-quality
graphics easily.



On Unix and Linux platforms, you can now access the powerful
statistical and graphical techniques of S-PLUS through a graphical
user interface. Easily import and export your data, run statistical
analyses, and create revealing graphs, all through convenient
menus and dialogs like those available on Windows. For programming,
an interactive commands window gives you access to all the
power and flexibility of the S language.



[Screenshot: the S-PLUS graphical user interface, showing the menu bar (File, Edit, View, Insert, Format, Data, Statistics, Graph, Options, Window, Help), a data window of US cities with columns AreaName, State, Population and PopDens, the object explorer (Data, Graphs, Reports, Scripts, SearchPath), summary() output in the commands window, and a filled spline surface plot.]



Change the Details of Your Graph With
Ease. Point-and-click control over every detail
of your graphs makes it easy to produce stunning
publication-quality output. Change line
widths, axes, colors, labels, fonts, symbol
types and more with ease.
[Screenshot: summary statistics output for a car data set, with columns Disp., Mileage, Fuel and Type.]



Completely Customize the User Interface 

Change toolbars, menus and dialogs to suit
your working style. Add and delete options
with ease.



[Screenshot: graph window titled "Populations of US Cities".]



Produce Publication-Quality Output 

Analyze, visualize and present your data using 
over 80 2D and 3D graph types.



Yosemite National Park Vegetation Map 




Automate Repetitive Tasks With Powerful Scripting Capabilities 

All S-PLUS operations are recorded in scripts which can be saved and
executed to automate repetitive tasks. Drag-and-drop objects into a script
window to instantly generate S-PLUS commands, and create your
own buttons by dragging your scripts onto the toolbar. Share your work
with your colleagues by giving them your toolbar and script files.
S-PLUS starts where your
spreadsheet leaves off. 

Spreadsheets are great for entering and
organizing data, but if you're also trying
to analyze your data with a spreadsheet
you're potentially missing valuable
insights by limiting yourself to
inadequate methods. With S-PLUS 6 you
get the best of both worlds: just open
your Excel worksheet from within S-PLUS,
select a region of data you wish to analyze,
and instantly gain access to all of
S-PLUS's advanced statistical and
graphical functionality applied to your
Excel data.



"... none has approached ..."
DM Review 



Perform analysis using the most comprehensive 
solution available today 

S-PLUS offers over 4,200 built-in data analysis functions
including modern and classical techniques.
Convenient menus, toolbars and dialogs let you access and
analyze your data easily.



Choose from the most comprehensive set of modern and robust methods 
available. 

Tap into the power of linear and nonlinear regression, generalized linear models, generalized additive 
models, tree models, smoothing splines, survival analysis, time series, cluster analysis, robust 
methods, analysis of data with missing values, multiple comparisons and much, much more. 




The S language gives you control 
over your data. 

Precise, in-depth analysis may require extensive
data manipulation and cleansing. With
the S language at the core of S-PLUS you can
easily prepare your data for analysis and
graphing. From data transformations to data
cleansing and validating data integrity, the S
language provides the tools to get the job done
efficiently, leaving you with a script file of
transformations made to validate your actions.



Select the model that can provide you with the best results easily. 

With the object-oriented S-PLUS environment, all functions, data and fitted models are handled as
objects. This allows you to fit alternative models using both classical and modern methods so you can
be confident the model you have selected will deliver the best results.




Create or extend analysis methods to meet your specific needs. 

S-PLUS offers a powerful programming language allowing you to create or extend analyses. As your 
analyses become more complex, S-PLUS can be extended to meet the challenge. Tap into the power, 
flexibility and extensibility of S-PLUS to take your analyses to the next level. 



Cutting-edge methods available as Web downloads. 

S-PLUS is the environment of choice for researchers world-wide developing advanced statistical
methods for new problems in the world of data analysis. New S-PLUS functions and programs
are available for download from third-party websites or from our own S-PLUS Community forum.
Create PowerPoint presentation slides with a click of a mouse.

S-PLUS graphics can be embedded in standard Microsoft Office products from PowerPoint slides to
Word documents. Translate your results into professional-looking reports using S-PLUS's unique
graphic capabilities.



Total control over your results. 

With S-PLUS you're not limited to pre-defined templates for your results. The S language gives you the
freedom to present your results in any way you need: graphically, as free-form text, or as Web-ready
HTML tables.



Export your graphs into versatile formats for Web 
or print presentations. 

Export publication-quality S-PLUS graphs to popular graphical
formats including PostScript, GIF, PDF, and Windows MetaFile.
Create interactive graphics on your web pages using Java-based
S-PLUS Graphlets™.



S-PLUS: Power where you need it 

S-PLUS has a completely open interface, allowing
it to be integrated into virtually any system.
Regular analyses can be automated using
S-PLUS's batch processing system employing
the flexibility of the S command language. On
Unix systems, S-PLUS's CONNECT/Java
Interface allows S-PLUS to be integrated with
any Java application. On Windows, the
CONNECT/C++ Interface allows you to access
S-PLUS's complete range of analytic methods
from C++ applications you develop. And
S-PLUS's DDE and OLE Automation interfaces
allow you to integrate S-PLUS with other
Windows applications, allowing you to access
S-PLUS functionality within Excel or from
Visual Basic applications.



Business Benefit 

• Leverage Statistical Expertise 

• Deploy Analyses Quickly and Easily 

• Access Analyses Through the Web 



"By launching a Web browser, employees
can crunch data that analysts at headquarters
once had to do."
Business Week 



Distribute Analytics Enterprise-wide 

Today's decision-makers need access to real-time analysis of business data to anticipate risk,
predict customer behavior and capitalize on emerging market opportunities. Insightful's
client/server products can provide decision makers with up-to-the-minute results on their desktop
based on analytic or graphical methods using S-PLUS. With a custom-created application
designed expressly for the problem at hand, decision makers gain insight from their data soon-
S-PLUS Comprehensive Feature List



STATISTICAL & NUMERICAL
TECHNIQUES
Basic Statistics 



Nonparametric Regression 



GRAPHICS & VISUALIZATION
Plot Type Highlights 



Time Series Analysis 



Interconnectivity 



Hypothesis Tests and Confidence 
Intervals 



Smoothing 



Linear and Nonlinear Mixed- 
Effects Models 



Regression 

■ Po'V'lOn'k); irqrO'A.;>!l 

■ ■■'■k'a:' q^q-iyq :> 

■ I rjiynety spirit' 'y >qys 



- (.ooysyi^yqyv.on 

- I iKh^.'C yqyssyn 

! egression 

- ^iotny: MM 'oqvv.iL>' i 

Analysis of Variance 

• f !H*;t ie spfy tiyit .in ot var ! 
itevct yis yy; nq 



■ i'ies qno;: tyneryyoy one 

yitwuinceu 

■ Variance component estimation

■ Multiple comparisons: Fisher,
Tukey, Dunnett, Sidak, Bonferroni,
Scheffé, simulation-based

Nonlinear Regression and 
Maximum Likelihood 

■ rvn ■K'ii- yq r; y-y 



Resampling 



Multivariate Analysis 

• i.yn.j'K-) ro ,r r <!' O'l 

• J '\;n: ( n;i ys s 
■ Factor analysis

• Multidimensional scaling



Cluster Analysis 

■ Monothetic clustering

■ Mi.nV'' p-yeP "'lIn'*"-' nq 

» t. r s'; ?r\r \,77\ : ,ytC n;j 

» Divisive and agglomerative
methods

Quality Control 

Cusum chart



Power and Sample Size 

No r mai 'Tiffin 

Survival Analysis 



k-t rant, ant: oyy- 



Date, Time, and Calendar Data 



■q : req. ' yi -yo nq 



t y vfty in i le-na 
. • n t 7'.»'v>s v qn cyv 1 qT s 0 , ny- 

■ \ lex - o yy air Ciitf t y r-y:t 

• \ yy j'i c.y 'v y- 

• r.e a" Vt" Ii'i't; V it- 1 SeCyyiCC d'lu 

(vea oP|fcy 

Mathematical Computations 

• Vector and matrix computations

■ Matrix decompositions



■ r,on ne.li t:.p* i:n 'at.un 

" i.oiyy nop opti yznvon 

Robust Methods 

■ \cv. : .: r-q ;. e> j'i yy< < :>y ' ' 

■ ■ Jov\ ;}lt its *o r ;n:iyr Petection .-m^ 

parryns py^^g 11 ] 

Missing Data Library 

• Multiple imputation

■ Gaussian, loglinear and conditional
Gaussian models

Large Data Set Support 

» Menorv ryq p>n j 

■ RRft r en :;f; count nq 



Vdnat) *: selection on .y;x.ct 
S.qyn-* yr <yyyn: yxe^ tj 



H-M-'y nq* - 



Advanced Data Visualization 

■ i >;:i...-, y i fi ^ q'.nn-f-. f y : rip: 



• J p-0|{'[ : orr in 3 D sp t y.e 

■ N\, ; t y-: 1 pyi - 'jf p;;y-j /■ '\\ 

T Kin 

■ :n:yy.'yf }Li-^ r Vd*::_-n jent ( 
« Graphlets

Customizability and Editing 

■ I I f * X = t. if: pj: ]6 i r/OUt 

• Ovf 1 - > !v rj"..:ph' iv. display s.do Py 

■ 'y j ' fy o* ny .vid sv^jO ^ 

• Lonu ii o-./( r i f yyie. '"ia ; ",t; r ty|)e 
';:v;r. iapt-y k im; t < t- : 

■ '.y..: py iiGf.q Lyq<."i* y slu t-.: x 
.][!(.! y u:<0S 

■ V.qt pi-; i: .e t yt annotj: yi 

Import and Export 

■ Import SAS, SPSS, Excel, Matlab,
and other file formats

■ Access Oracle, Sybase, Informix
and ODBC databases

■ -xport qrai'hir - as PDF. po- tscnpt. 
y HFK-I 

PROGRAMMING & 
EXTENSIBILITY 
Object-Oriented Language 

■ Uses the object-oriented S language

■ O.y -:00 p.. * u ( ./iy on - 

■A-Titf! r K.w functions 

• R ci t a '■ tyyj'fs .ny j::o 



MS Office Integration (Windows) 



■ y>-L S F'l .Sqy-i-y- .n 
anij re i tiicn -n oiace 

• yri) c 10 5: P: a^l-'Cs to 
: 'yite ^ PI ijs ci', ; 3ns frotn ■.vitnin 
£ r oPSb 

User Contributed Code 

■ I ipia ie : i as.ux:i iti'O vvitn the Pook 
Mew, Aieinr .:r,?;yfy- .vq' 1 

■ Hnusc i- in dos.qri o^nts for bio 
yato.t c.ri and i pi j-;yol(.-q!C n-oci 



Help and Documentation 



:;qnnt ind o -i t : noyVif 



60-DAY MONEY-BACK
GUARANTEE

Call now to discuss your business 
needs 



Insightful 